{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "792fbf58-9810-4096-9a07-c01ffd9cd4f4",
   "metadata": {},
   "source": [
    "## 创建dask arrays的几种方式"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f24b3526-7947-4d0b-a5c0-c86cd848e56d",
   "metadata": {},
   "source": [
    "创建dask arrays方式有很多，比如我们可以从HDF5, NetCDF, Zarr等支持numpy风格切片的文件中读取，或者使用from_array方法将一个array like的对象\n",
    "转成dask arrays，或者使用dask提供的“随机数”api生成。\n",
    "* from_array\n",
    "* from_delayed\n",
    "* from dataframe（略）\n",
    "* da.random（略）\n",
    "* interactions with NumPy arrays\n",
    "* 读取文件\n",
    "* 保存dask arrays\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "43283185-5f2a-46c9-aa52-2e87cfd31de6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import dask.array as da\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8ed6148-ba79-4aa1-939f-04689b807b5d",
   "metadata": {},
   "source": [
    "### from_array\n",
    "使用from_array方法将一个array like（Input must have a .shape, .ndim, .dtype and support numpy-style slicing.）的对象转成dask arrays（list也可以）。\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "20287e3d-131b-431c-be9a-cd121281a6d8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <td>\n",
       "            <table style=\"border-collapse: collapse;\">\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <td> </td>\n",
       "                        <th> Array </th>\n",
       "                        <th> Chunk </th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "                    \n",
       "                    <tr>\n",
       "                        <th> Bytes </th>\n",
       "                        <td> 80 B </td>\n",
       "                        <td> 40 B </td>\n",
       "                    </tr>\n",
       "                    \n",
       "                    <tr>\n",
       "                        <th> Shape </th>\n",
       "                        <td> (10,) </td>\n",
       "                        <td> (5,) </td>\n",
       "                    </tr>\n",
       "                    <tr>\n",
       "                        <th> Dask graph </th>\n",
       "                        <td colspan=\"2\"> 2 chunks in 1 graph layer </td>\n",
       "                    </tr>\n",
       "                    <tr>\n",
       "                        <th> Data type </th>\n",
       "                        <td colspan=\"2\"> int64 numpy.ndarray </td>\n",
       "                    </tr>\n",
       "                </tbody>\n",
       "            </table>\n",
       "        </td>\n",
       "        <td>\n",
       "        <svg width=\"170\" height=\"88\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
       "\n",
       "  <!-- Horizontal lines -->\n",
       "  <line x1=\"0\" y1=\"0\" x2=\"120\" y2=\"0\" style=\"stroke-width:2\" />\n",
       "  <line x1=\"0\" y1=\"38\" x2=\"120\" y2=\"38\" style=\"stroke-width:2\" />\n",
       "\n",
       "  <!-- Vertical lines -->\n",
       "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"38\" style=\"stroke-width:2\" />\n",
       "  <line x1=\"60\" y1=\"0\" x2=\"60\" y2=\"38\" />\n",
       "  <line x1=\"120\" y1=\"0\" x2=\"120\" y2=\"38\" style=\"stroke-width:2\" />\n",
       "\n",
       "  <!-- Colored Rectangle -->\n",
       "  <polygon points=\"0.0,0.0 120.0,0.0 120.0,38.596863036086 0.0,38.596863036086\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
       "\n",
       "  <!-- Text -->\n",
       "  <text x=\"60.000000\" y=\"58.596863\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >10</text>\n",
       "  <text x=\"140.000000\" y=\"19.298432\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(0,140.000000,19.298432)\">1</text>\n",
       "</svg>\n",
       "        </td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "dask.array<array, shape=(10,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 通过列表生成\n",
    "li = list(range(10))\n",
    "x_li = da.from_array(li, chunks=5)\n",
    "x_li"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "0bdca850-6309-4f12-a384-f691c9ecb35f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <td>\n",
       "            <table style=\"border-collapse: collapse;\">\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <td> </td>\n",
       "                        <th> Array </th>\n",
       "                        <th> Chunk </th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "                    \n",
       "                    <tr>\n",
       "                        <th> Bytes </th>\n",
       "                        <td> 80 B </td>\n",
       "                        <td> 40 B </td>\n",
       "                    </tr>\n",
       "                    \n",
       "                    <tr>\n",
       "                        <th> Shape </th>\n",
       "                        <td> (10,) </td>\n",
       "                        <td> (5,) </td>\n",
       "                    </tr>\n",
       "                    <tr>\n",
       "                        <th> Dask graph </th>\n",
       "                        <td colspan=\"2\"> 2 chunks in 1 graph layer </td>\n",
       "                    </tr>\n",
       "                    <tr>\n",
       "                        <th> Data type </th>\n",
       "                        <td colspan=\"2\"> int64 numpy.ndarray </td>\n",
       "                    </tr>\n",
       "                </tbody>\n",
       "            </table>\n",
       "        </td>\n",
       "        <td>\n",
       "        <svg width=\"170\" height=\"88\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
       "\n",
       "  <!-- Horizontal lines -->\n",
       "  <line x1=\"0\" y1=\"0\" x2=\"120\" y2=\"0\" style=\"stroke-width:2\" />\n",
       "  <line x1=\"0\" y1=\"38\" x2=\"120\" y2=\"38\" style=\"stroke-width:2\" />\n",
       "\n",
       "  <!-- Vertical lines -->\n",
       "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"38\" style=\"stroke-width:2\" />\n",
       "  <line x1=\"60\" y1=\"0\" x2=\"60\" y2=\"38\" />\n",
       "  <line x1=\"120\" y1=\"0\" x2=\"120\" y2=\"38\" style=\"stroke-width:2\" />\n",
       "\n",
       "  <!-- Colored Rectangle -->\n",
       "  <polygon points=\"0.0,0.0 120.0,0.0 120.0,38.596863036086 0.0,38.596863036086\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
       "\n",
       "  <!-- Text -->\n",
       "  <text x=\"60.000000\" y=\"58.596863\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >10</text>\n",
       "  <text x=\"140.000000\" y=\"19.298432\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(0,140.000000,19.298432)\">1</text>\n",
       "</svg>\n",
       "        </td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "dask.array<array, shape=(10,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 通过np.array生成\n",
    "import array\n",
    "arr = np.array(li)\n",
    "x_arr = da.from_array(arr,chunks=5)\n",
    "x_arr"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87a9ea6e-efc3-4a71-9214-dadae493d0d4",
   "metadata": {},
   "source": [
    "### from_delayed\n",
    "如果被转换的对象驻留在不是array like的状态中，比如使用了dask.delayed将其进行了延迟计算，这时我们可以使用from_delayed方法从一个delayed对象\n",
    "上创建dask arrays。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "0118cb09-6cd0-493e-b974-345b3c27fc56",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,\n",
       "       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,\n",
       "       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,\n",
       "       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,\n",
       "       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,\n",
       "       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 原始的da\n",
    "x = da.array(list(range(100)))\n",
    "x_dtype, x_shape = x.dtype, x.shape\n",
    "x.compute()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "43197bf1-74c3-4ab7-9fdb-c6fe0bc0f74c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Delayed('add_one-ea443ccadb0d5eba8e029037588d08dc')"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# da中的每一个值加一\n",
    "def add_one(da):\n",
    "    return da + 1\n",
    "    \n",
    "delayed_add_one = dask.delayed(add_one, pure=True)\n",
    "lazy_x = delayed_add_one(x)\n",
    "lazy_x\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "fbb113e5-004d-4911-97f1-2a796a93c4bf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <td>\n",
       "            <table style=\"border-collapse: collapse;\">\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <td> </td>\n",
       "                        <th> Array </th>\n",
       "                        <th> Chunk </th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "                    \n",
       "                    <tr>\n",
       "                        <th> Bytes </th>\n",
       "                        <td> 800 B </td>\n",
       "                        <td> 800 B </td>\n",
       "                    </tr>\n",
       "                    \n",
       "                    <tr>\n",
       "                        <th> Shape </th>\n",
       "                        <td> (100,) </td>\n",
       "                        <td> (100,) </td>\n",
       "                    </tr>\n",
       "                    <tr>\n",
       "                        <th> Dask graph </th>\n",
       "                        <td colspan=\"2\"> 1 chunks in 4 graph layers </td>\n",
       "                    </tr>\n",
       "                    <tr>\n",
       "                        <th> Data type </th>\n",
       "                        <td colspan=\"2\"> int64 numpy.ndarray </td>\n",
       "                    </tr>\n",
       "                </tbody>\n",
       "            </table>\n",
       "        </td>\n",
       "        <td>\n",
       "        <svg width=\"170\" height=\"75\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
       "\n",
       "  <!-- Horizontal lines -->\n",
       "  <line x1=\"0\" y1=\"0\" x2=\"120\" y2=\"0\" style=\"stroke-width:2\" />\n",
       "  <line x1=\"0\" y1=\"25\" x2=\"120\" y2=\"25\" style=\"stroke-width:2\" />\n",
       "\n",
       "  <!-- Vertical lines -->\n",
       "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"25\" style=\"stroke-width:2\" />\n",
       "  <line x1=\"120\" y1=\"0\" x2=\"120\" y2=\"25\" style=\"stroke-width:2\" />\n",
       "\n",
       "  <!-- Colored Rectangle -->\n",
       "  <polygon points=\"0.0,0.0 120.0,0.0 120.0,25.412616514582485 0.0,25.412616514582485\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
       "\n",
       "  <!-- Text -->\n",
       "  <text x=\"60.000000\" y=\"45.412617\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >100</text>\n",
       "  <text x=\"140.000000\" y=\"12.706308\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(0,140.000000,12.706308)\">1</text>\n",
       "</svg>\n",
       "        </td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "dask.array<from-value, shape=(100,), dtype=int64, chunksize=(100,), chunktype=numpy.ndarray>"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_after = da.from_delayed(lazy_x, dtype=x_dtype, shape=x_shape)\n",
    "x_after"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "c285ad5d-b3b0-4de3-8fd9-23f3c4568ce4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,\n",
       "        14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,\n",
       "        27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,\n",
       "        40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,\n",
       "        53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,\n",
       "        66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,\n",
       "        79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,\n",
       "        92,  93,  94,  95,  96,  97,  98,  99, 100])"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_after.compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9638a55e-78aa-4a58-ab38-4679910918ca",
   "metadata": {},
   "source": [
    "### Interactions with NumPy arrays\n",
    "np.array（或者是任何np.array like的对象）和da一些交互，交互后的结果会自动转为da。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "71c2a53c-42a8-4925-81df-707358db1966",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <td>\n",
       "            <table style=\"border-collapse: collapse;\">\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <td> </td>\n",
       "                        <th> Array </th>\n",
       "                        <th> Chunk </th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "                    \n",
       "                    <tr>\n",
       "                        <th> Bytes </th>\n",
       "                        <td> 8 B </td>\n",
       "                        <td> 8 B </td>\n",
       "                    </tr>\n",
       "                    \n",
       "                    <tr>\n",
       "                        <th> Shape </th>\n",
       "                        <td> () </td>\n",
       "                        <td> () </td>\n",
       "                    </tr>\n",
       "                    <tr>\n",
       "                        <th> Dask graph </th>\n",
       "                        <td colspan=\"2\"> 1 chunks in 3 graph layers </td>\n",
       "                    </tr>\n",
       "                    <tr>\n",
       "                        <th> Data type </th>\n",
       "                        <td colspan=\"2\"> float64 numpy.ndarray </td>\n",
       "                    </tr>\n",
       "                </tbody>\n",
       "            </table>\n",
       "        </td>\n",
       "        <td>\n",
       "        \n",
       "        </td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "dask.array<sum-aggregate, shape=(), dtype=float64, chunksize=(), chunktype=numpy.ndarray>"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 使用da对np.array进行求和\n",
    "x = da.sum(np.ones(5))\n",
    "x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "3eefddf0-57a1-4f1f-9605-7b0f1303f0eb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <td>\n",
       "            <table style=\"border-collapse: collapse;\">\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <td> </td>\n",
       "                        <th> Array </th>\n",
       "                        <th> Chunk </th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "                    \n",
       "                    <tr>\n",
       "                        <th> Bytes </th>\n",
       "                        <td> 80 B </td>\n",
       "                        <td> 40 B </td>\n",
       "                    </tr>\n",
       "                    \n",
       "                    <tr>\n",
       "                        <th> Shape </th>\n",
       "                        <td> (10,) </td>\n",
       "                        <td> (5,) </td>\n",
       "                    </tr>\n",
       "                    <tr>\n",
       "                        <th> Dask graph </th>\n",
       "                        <td colspan=\"2\"> 2 chunks in 4 graph layers </td>\n",
       "                    </tr>\n",
       "                    <tr>\n",
       "                        <th> Data type </th>\n",
       "                        <td colspan=\"2\"> float64 numpy.ndarray </td>\n",
       "                    </tr>\n",
       "                </tbody>\n",
       "            </table>\n",
       "        </td>\n",
       "        <td>\n",
       "        <svg width=\"170\" height=\"88\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
       "\n",
       "  <!-- Horizontal lines -->\n",
       "  <line x1=\"0\" y1=\"0\" x2=\"120\" y2=\"0\" style=\"stroke-width:2\" />\n",
       "  <line x1=\"0\" y1=\"38\" x2=\"120\" y2=\"38\" style=\"stroke-width:2\" />\n",
       "\n",
       "  <!-- Vertical lines -->\n",
       "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"38\" style=\"stroke-width:2\" />\n",
       "  <line x1=\"60\" y1=\"0\" x2=\"60\" y2=\"38\" />\n",
       "  <line x1=\"120\" y1=\"0\" x2=\"120\" y2=\"38\" style=\"stroke-width:2\" />\n",
       "\n",
       "  <!-- Colored Rectangle -->\n",
       "  <polygon points=\"0.0,0.0 120.0,0.0 120.0,38.596863036086 0.0,38.596863036086\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
       "\n",
       "  <!-- Text -->\n",
       "  <text x=\"60.000000\" y=\"58.596863\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >10</text>\n",
       "  <text x=\"140.000000\" y=\"19.298432\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(0,140.000000,19.298432)\">1</text>\n",
       "</svg>\n",
       "        </td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "dask.array<add, shape=(10,), dtype=float64, chunksize=(5,), chunktype=numpy.ndarray>"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Automatic rechunking rules will generally slice the NumPy array into the appropriate Dask chunk shape\n",
    "x = da.ones(10, chunks=(5,))\n",
    "y = np.ones(10)\n",
    "z = x + y\n",
    "z"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "2ea72e18-fa36-462e-90f3-6f003af79569",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2.])"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z.compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f80d4954-d626-4195-a324-fa12aced3cb6",
   "metadata": {},
   "source": [
    "## 写入dask arrays的几种方式"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "975b844e-5457-4bf3-b8a6-a7acf1102746",
   "metadata": {},
   "source": [
    "### 写入hdf5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d09dabd-8d78-4984-88f7-5f1876077995",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.7rc1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
